DBSCAN Revisited, Revisited: Why and How You Should (Still) Use DBSCAN
claude.icon
This paper provides a rebuttal to the "DBSCAN Revisited" paper presented at SIGMOD in 2015. The main arguments are as follows:.
1. on the theoretical execution time limits of DBSCAN: 1.
The SIGMOD paper proved that DBSCAN cannot be performed in O(n log n), but that does not mean "computationally difficult" in practical terms
While many alternative methods are Θ(n²) or Θ(n³), DBSCAN is still a valid method for large data
2. problems of the experiment: 1.
The parameters used in the SIGMOD paper (especially epsilon) were inappropriately large
Showed that the original DBSCAN utilizing indexes runs faster when using a more appropriate smaller epsilon
Inadequate preprocessing of the data set, resulting in no significant clustering results
3. advantages of DBSCAN: 1.
Distance functions other than Euclidean distance can be used
Can be combined with various index structures such as R*-tree
Works efficiently in real applications
In conclusion, while the SIGMOD paper is valuable in that it presents a theoretical lower bound, it argues that the claim that "DBSCAN should not be used with large data" is not appropriate and that with the right parameters and indexes, DBSCAN is still a competitive algorithm. We agree.
---